Near Linear Time Detection of Distance-Based Outliers and Applications to Security
نویسندگان
چکیده
Many automated systems for detecting threats are based on matching a new database record to known attack types. However, this approach can only spot known threats and thus researchers have also be gun to use unsupervised approaches based on de tecting outliers or anomalous examples. A popular method of finding these outliers is to use the dis tance to an example’s k nearest neighbors as a mea sure of unusualness. However, existing algorithms for finding distance-based outliers have poor scaling properties, making it difficult to apply them to large datasets typically available in security domains. In this paper, we propose modifications to a simple, but quadratic, algorithm for finding distance-based out liers, and show that it achieves near linear time scal ing allowing it to be applied to real data sets with millions of examples and many features.
منابع مشابه
Detection of Outliers and Influential Observations in Linear Ridge Measurement Error Models with Stochastic Linear Restrictions
The aim of this paper is to propose some diagnostic methods in linear ridge measurement error models with stochastic linear restrictions using the corrected likelihood. Based on the bias-corrected estimation of model parameters, diagnostic measures are developed to identify outlying and influential observations. In addition, we derive the corrected score test statistic for outliers detection ba...
متن کاملAssessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...
متن کاملApplication of Recursive Least Squares to Efficient Blunder Detection in Linear Models
In many geodetic applications a large number of observations are being measured to estimate the unknown parameters. The unbiasedness property of the estimated parameters is only ensured if there is no bias (e.g. systematic effect) or falsifying observations, which are also known as outliers. One of the most important steps towards obtaining a coherent analysis for the parameter estimation is th...
متن کاملA robust wavelet based profile monitoring and change point detection using S-estimator and clustering
Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications due to the complexity of many processes it is not usually possible to model a process using parametric models.In these cas...
متن کاملIdentification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data, often, modeled using vector autoregressive moving average (VARMA) model. But presence of outliers can violates the stationary assumption and may lead to wrong modeling, biased estimation of parameters and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
متن کامل